Abstract


Relax, Compensate and then Recover (RCR) is a paradigm for approximate inference
in probabilistic graphical models that has previously provided theoretical and
practical insights on iterative belief propagation and some of its generalizations.

In this paper, we characterize the technique of dual decomposition in the terms
of RCR, viewing it as a specific way to compensate for relaxed equivalence constraints.
Among other insights gathered from this perspective, we propose novel
heuristics for recovering relaxed equivalence constraints with the goal of incrementally
tightening dual decomposition approximations, all the way to reaching
exact solutions. We also show empirically that recovering equivalence constraints
can sometimes tighten the corresponding approximation (and even yield exact results)
without much increase in the complexity of inference.

1 Introduction 

Relax, Compensate and then Recover (RCR) is a paradigm for approximate inference that is based
on performing three steps [1]. First, one relaxes equivalence constraints in a given model to obtain
a simplified model that is tractable for exact inference. Second, one compensates for the relaxed
equivalences by enforcing a weaker notion of equivalence. Finally, by recovering equivalence constraints
in a selective way, one can incrementally obtain increasingly accurate approximations, all
the way to exact solutions. This paradigm is flexible enough to characterize existing algorithms 
for approximate inference, such as iterative belief propagation (IBP) J2J El SJ . Moreover, a system 
based on RCR was also successfully employed in the UAI2010 evaluation of approximate inference, 
where it was the leading system in two of the most time-constrained categories evaluated 0. 

Dual decomposition is a popular and effective approach for approximating MPE problems in probabilistic
graphical models [6] [7] [U Q This technique has a number of desirable properties. For
example, it provides an upper bound on the original MPE problem, which in some cases, can be 
tight. Moreover, algorithms for solving the corresponding dual optimization problem have desirable 
theoretical properties, such as monotonic improvements as in block coordinate descent algorithms. 

In this paper, we formulate dual decomposition as an instance of RCR. In particular, we view dual 
decomposition as a particular way of restoring a weaker notion of equivalence when one relaxes an 
equivalence constraint. From the viewpoint of RCR, this perspective gives rise to a new family of 
compensations with distinctive properties, such as upper bounds on MPE problems, but also upper 
bounds on the partition function. From the viewpoint of dual decomposition, this perspective (a)
gives rise to a new approach to tightening upper bounds, based on new heuristics for recovering
equivalence constraints; (b) expands the reach of dual decomposition by allowing its application to
other inference tasks beyond MPE; and (c) positions dual decomposition to capitalize on the vast
literature on exact inference in addition to its classical capitalization on the optimization literature.

1 MPE refers to the problem of finding a complete instantiation of a graphical model with maximal probability.
This is commonly referred to as MAP as well. However, many authors reserve MAP for the problem of finding
a partial instantiation with maximal probability, which is a much more difficult task computationally than
MPE. We observe this distinction between MPE and MAP in this paper.

Empirically, we show that the recovery of equivalence constraints in our RCR formulation of dual 
decomposition can incrementally and effectively tighten the upper bounds of dual decomposition, 
leading to optimal solutions in some cases while recovering only a few equivalence constraints, and
without much increase in the complexity of inference.

2 Dual Decomposition 

We first illustrate the technique of dual decomposition using a concrete example, referring the reader
to references such as [8] for a more general treatment.

Consider the MRF $\psi(A,B,C) = \psi_1(A,B)\,\psi_2(B,C)\,\psi_3(A,C)$, where the goal is to find an
instantiation $a,b,c$ of variables $A,B,C$ that maximizes $\psi(a,b,c)$. We refer to this as the MPE
problem. We also refer to $\max_{a,b,c} \psi(a,b,c)$ as the MPE value and to the maximizing
instantiation $a,b,c$ as an MPE instantiation. Finally, an MRF induces the probability distribution
$\Pr(A,B,C) = \frac{1}{Z}\,\psi(A,B,C)$, where we refer to $Z = \sum_{a,b,c} \psi(a,b,c)$ as the partition function.

Dual decomposition is a technique for approximating the MPE problem, which can be described
concretely as follows. We first clone the occurrence of each variable in each factor, leading to
auxiliary variables $A_1, B_1$ and $B_2, C_2$ and $A_3, C_3$. We now have the fully decomposed MRF:

$$\psi(A,B,C,A_1,B_1,B_2,C_2,A_3,C_3) = \psi_1(A_1,B_1)\,\psi_2(B_2,C_2)\,\psi_3(A_3,C_3)\;
eq(A,A_1)\,eq(A,A_3)\,eq(B,B_1)\,eq(B,B_2)\,eq(C,C_2)\,eq(C,C_3),$$

where $eq(X, X_i)$ is an equivalence constraint. That is, $eq(x, x_i) = 1$ when $x = x_i$ and $eq(x, x_i) = 0$
when $x \neq x_i$. Note that $\psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3) = \psi(a,b,c)$ when $a = a_1 = a_3$, $b = b_1 = b_2$
and $c = c_2 = c_3$; otherwise, $\psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3) = 0$. Hence,

$$\max_{a,b,c} \psi(a,b,c) = \max_{a,b,c,a_1,b_1,b_2,c_2,a_3,c_3} \psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3).$$

The original and fully decomposed MRFs are then equivalent as far as computing the MPE value.

We now relax the equivalence constraints (i.e., drop them), while replacing each constraint
$eq(X, X_i)$ by $\theta_i(X)/\theta_i(X_i)$ (which is equal to one when $x = x_i$), leading to:

$$\psi(A,B,C,A_1,B_1,B_2,C_2,A_3,C_3) = \psi_1(A_1,B_1)\,\psi_2(B_2,C_2)\,\psi_3(A_3,C_3)\;
\frac{\theta_1(A)\,\theta_3(A)}{\theta_1(A_1)\,\theta_3(A_3)}\;
\frac{\theta_1(B)\,\theta_2(B)}{\theta_1(B_1)\,\theta_2(B_2)}\;
\frac{\theta_2(C)\,\theta_3(C)}{\theta_2(C_2)\,\theta_3(C_3)}.$$

Note that $\psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3) = \psi(a,b,c)$ when $a = a_1 = a_3$, $b = b_1 = b_2$ and $c = c_2 = c_3$;
otherwise, $\psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3)$ is incomparable to $\psi(a,b,c)$. Hence,

$$\max_{a,b,c} \psi(a,b,c) \;\le\; \max_{a,b,c,a_1,b_1,b_2,c_2,a_3,c_3} \psi(a,b,c,a_1,b_1,b_2,c_2,a_3,c_3)$$

$$= \Big[\max_a \theta_1(a)\,\theta_3(a)\Big]
\Big[\max_b \theta_1(b)\,\theta_2(b)\Big]
\Big[\max_c \theta_2(c)\,\theta_3(c)\Big]
\Big[\max_{a_1,b_1} \frac{\psi_1(a_1,b_1)}{\theta_1(a_1)\,\theta_1(b_1)}\Big]
\Big[\max_{b_2,c_2} \frac{\psi_2(b_2,c_2)}{\theta_2(b_2)\,\theta_2(c_2)}\Big]
\Big[\max_{a_3,c_3} \frac{\psi_3(a_3,c_3)}{\theta_3(a_3)\,\theta_3(c_3)}\Big].$$

This is called the dual objective and is guaranteed to provide an upper bound on the MPE value,
$\max_{a,b,c} \psi(a,b,c)$, regardless of the specific values chosen for multipliers $\theta_i(x) > 0$. However, one
can improve the upper bound by searching for multipliers $\theta_i(x)$ that minimize the dual objective.

Minimization problems such as this one can be tackled using techniques from the optimization
literature. For example, subgradient methods are applicable to objective functions that are not
differentiable, such as the one above. They are also guaranteed to minimize the dual objective to
optimality, with an appropriate choice of step sizes. For another example, block coordinate descent
methods monotonically decrease the dual objective at each step, and can yield faster convergence
rates than subgradient methods. However, they are not necessarily guaranteed to minimize the dual
objective. See [8] for a more thorough introduction to dual decomposition, and algorithms for the
dual optimization problem.
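
As a numerical illustration of the bound, the following minimal sketch evaluates the dual objective of the three-factor example for a given choice of multipliers and compares it with the exact MPE value obtained by enumeration. The factor tables, the random seed, and the names `psi1`, `psi2`, `psi3`, `mpe_value` and `dual_objective` are assumptions made for the example; they do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical positive factors over binary variables: psi1[a, b], psi2[b, c], psi3[a, c].
psi1, psi2, psi3 = (rng.uniform(0.1, 1.0, size=(2, 2)) for _ in range(3))


def mpe_value():
    """Exact MPE value max_{a,b,c} psi1(a,b) psi2(b,c) psi3(a,c), by enumeration."""
    joint = psi1[:, :, None] * psi2[None, :, :] * psi3[:, None, :]  # axes (a, b, c)
    return joint.max()


def dual_objective(th):
    """Dual objective for positive multipliers th[name][value].

    th["A1"] plays the role of theta_1(A), th["C3"] that of theta_3(C), and so on.
    """
    originals = (
        (th["A1"] * th["A3"]).max()
        * (th["B1"] * th["B2"]).max()
        * (th["C2"] * th["C3"]).max()
    )
    clones = (
        (psi1 / np.outer(th["A1"], th["B1"])).max()
        * (psi2 / np.outer(th["B2"], th["C2"])).max()
        * (psi3 / np.outer(th["A3"], th["C3"])).max()
    )
    return originals * clones


names = ["A1", "A3", "B1", "B2", "C2", "C3"]
ones = {n: np.ones(2) for n in names}
random_th = {n: rng.uniform(0.5, 2.0, size=2) for n in names}
print(mpe_value(), dual_objective(ones), dual_objective(random_th))
```

With all multipliers equal to one, the dual objective reduces to the product of the three factor maxima; any other positive choice still yields an upper bound, possibly a looser or tighter one.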

3 Relax, Compensate, and then Recover 

RCR is an approximate inference framework, which is based on three steps. The first step relaxes
equivalence constraints from the original model. The second step compensates for the relaxed
equivalences by enforcing some weaker notion of equivalence. The third step recovers some of the
equivalences in an anytime fashion, with the goal of improving the approximation. The main
computational work performed by RCR is in the compensation step, which requires exact inference on
the relaxed model (any exact inference algorithm can be used for this purpose). The recovery step
may also entail computational work, although this depends largely on the recovery heuristics (some
heuristics can be computed as a side effect of the compensation step, as we show later).

We will next illustrate the three steps of RCR using the same example discussed above. For a more
general treatment of RCR, however, the reader is referred to [1].

3.1 Relax 

The first step of RCR is similar to the one used by dual decomposition: We clone variables and
introduce equivalence constraints, leading to the following model:

$$\psi(A,B,C,A_1,B_1,B_2,C_2,A_3,C_3) = \psi_1(A_1,B_1)\,\psi_2(B_2,C_2)\,\psi_3(A_3,C_3)\;
eq(A,A_1)\,eq(A,A_3)\,eq(B,B_1)\,eq(B,B_2)\,eq(C,C_2)\,eq(C,C_3).$$

We can then relax an equivalence constraint by simply dropping it from the model. For example,
relaxing all equivalence constraints leads to the following model, which is fully decomposed:

$$\psi(A,B,C,A_1,B_1,B_2,C_2,A_3,C_3) = \psi_1(A_1,B_1)\,\psi_2(B_2,C_2)\,\psi_3(A_3,C_3).$$

In principle, one can relax as many constraints as one wishes; normally, one does so until the model
is disconnected enough to be feasible for exact inference. RCR, however, typically relaxes enough
equivalence constraints to render the model fully decomposed. It then recovers some of these constraints
incrementally and selectively, until it runs out of time or until the model becomes too connected to
be feasible for exact inference. More on this later.

3.2 Compensate 

Compensating for a relaxed equivalence constraint, say, $eq(A, A_1)$, is done by adding factors $\theta_{A_1}(A)$
and $\theta_A(A_1)$ in lieu of factor $eq(A, A_1)$, leading to the compensated model:

$$\psi(A,B,C,A_1,B_1,B_2,C_2,A_3,C_3) = \psi_1(A_1,B_1)\,\psi_2(B_2,C_2)\,\psi_3(A_3,C_3)\;
\theta_{A_1}(A)\,\theta_A(A_1)\,eq(A,A_3)\,eq(B,B_1)\,eq(B,B_2)\,eq(C,C_2)\,eq(C,C_3).$$

The added factors, $\theta_{A_1}(A)$ and $\theta_A(A_1)$, are sometimes called compensation factors. Note that
we shall omit the subscripts $X_i$ and $X$ when it is clear that factors $\theta(X)$ and $\theta(X_i)$ refer to the
compensation factors for equivalence constraint $eq(X, X_i)$. Moreover, whenever we refer to a state
$x$ of variable $X$, we will denote the corresponding state of variable $X_i$ by $x_i$, unless otherwise stated.

A compensation scheme is a set of conditions on the values of compensating factors. Each compensation
scheme leads to a class of approximations. In phrasing such conditions, we will write $mpe(a)$
to denote the MPE marginal, $\max_{b,c} \psi(a,b,c)$. We will also write $Z(a)$ to denote the partition
function marginal, $\sum_{b,c} \psi(a,b,c)$.

The following is a common condition used by different RCR compensation schemes. 

Definition 1 A compensation scheme for relaxed equivalence $eq(X, X_i)$ satisfies pr-equivalence
iff the distribution induced by the compensated model satisfies $\Pr(x) = \Pr(x_i)$ for all values $x$ and
their corresponding values $x_i$. Moreover, it satisfies mpe-equivalence iff $mpe(x) = mpe(x_i)$ for
all values $x$ and their corresponding values $x_i$.

A common and powerful technique for deriving further conditions on the compensation scheme
is based on considering a single relaxed equivalence, under some idealized situation, and finding
out what that idealization implies. Suppose, for example, that relaxing the equivalence constraint
$eq(X, X_i)$ splits the model into two disconnected components, one containing variable $X$ and another
containing variable $X_i$. This idealized situation implies the following condition, which is the
only condition that leads to exact node marginals.

Definition 2 A compensation scheme for relaxed equivalence $eq(X, X_i)$ satisfies model-split iff the
distribution induced by the compensated model satisfies pr-equivalence and

$$\Pr(x) \propto \theta(x)\,\theta(x_i).$$

On fully decomposed models, this compensation scheme leads to IBP approximations 01, and 
further the Bethe free energy approximation of the partition function fl 0 l l 4 l. 


3.3 Finding compensations 

The main computational work performed by RCR is in finding compensations that satisfy some 
stated conditions. This is usually done by deriving a characterization of the compensation, which 
yields fixed-point iterative equations. For example, compensations that satisfy model-split have been
characterized as follows [3].


Theorem 1 A compensation scheme for relaxed equivalence constraint $eq(X, X_i)$ satisfies model-split
iff the partition function $Z$ of the compensated model satisfies

$$\theta(x) = \alpha\,\frac{Z(x_i)}{\theta(x_i)} \qquad\qquad \theta(x_i) = \alpha\,\frac{Z(x)}{\theta(x)} \qquad\qquad (1)$$

for all states $x$, and their corresponding states $x_i$. Here, $\alpha$ is an arbitrary normalizing constant.


This theorem identifies update equations which form the basis of an iterative fixed-point algorithm
that searches for model-split compensations.² In fact, the message-passing updates of IBP are
precisely the fixed-point iterative updates implied by Equation (1) [3].³
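
To make the fixed-point iteration concrete, here is a minimal sketch on a deliberately tiny model: $\psi(X) = f(X)\,g(X)$, where $X$ is cloned to $X_1$ in $g$ and $eq(X, X_1)$ is relaxed, so that the relaxation splits the model into two components. The factors `f` and `g`, the helper `marginals`, and the choice of normalizing each compensation factor to sum to one are assumptions for illustration only. In this idealized setting the updates of Theorem 1 should reproduce the exact marginal of $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.uniform(0.1, 1.0, size=3)  # hypothetical factor f(X)
g = rng.uniform(0.1, 1.0, size=3)  # hypothetical factor g(X), attached to the clone X_1


def marginals(theta_x, theta_x1):
    """Marginals Z(x) and Z(x_1) of the model f(X) theta(X) * g(X_1) theta(X_1)."""
    comp_x, comp_x1 = f * theta_x, g * theta_x1
    return comp_x * comp_x1.sum(), comp_x1 * comp_x.sum()


theta_x, theta_x1 = np.ones(3), np.ones(3)
for _ in range(5):
    z_x, z_x1 = marginals(theta_x, theta_x1)
    new_x = z_x1 / theta_x1   # theta(x)   = alpha * Z(x_i) / theta(x_i)
    new_x1 = z_x / theta_x    # theta(x_i) = alpha * Z(x)   / theta(x)
    # alpha is arbitrary; here each factor is normalized to sum to one.
    theta_x, theta_x1 = new_x / new_x.sum(), new_x1 / new_x1.sum()

z_x, z_x1 = marginals(theta_x, theta_x1)
pr_x, pr_x1 = z_x / z_x.sum(), z_x1 / z_x1.sum()
exact = f * g / (f * g).sum()
print(pr_x, pr_x1, exact)
```

Here the iteration settles after a single pass, since each compensation factor simply ends up carrying the other component's contribution; in models where the relaxation does not split the model, the same updates are only approximate.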


3.4 Recover 

RCR typically relaxes enough equivalence constraints to yield a fully decomposed model. It then
recovers equivalence constraints incrementally and selectively, until it runs out of time or the model
becomes too connected to be feasible for exact inference. The recovery process is based on a heuristic,
called a recovery heuristic, that tries to identify the constraints whose relaxation has been most
damaging to the quality of an approximation.

A number of recovery heuristics have been proposed previously. One of these heuristics is based
on mutual information 0 and is designed for use with the compensation scheme that satisfies
model-split. Another heuristic was used by RCR at the UAI'10 approximate inference evaluation 0
[Q, which was critical to the performance (and success) of RCR in that evaluation.

Combining recovery with compensations that satisfy model-split yields approximations that
correspond to iterative joingraph propagation (IJGP) approximations [13] [14] [3].


4 A New Compensation Scheme: Dual Decomposition 

We will now consider a new compensation scheme for RCR, which gives rise to the dual decomposition
approximations of Section 2 when the inference task of RCR is that of computing MPE.

2 The required quantities correspond to partial derivatives, which can be computed efficiently in traditional
frameworks for inference [11] [12].

3 Similar characterizations and generalizations of IBP have been shown in [15] [16] [17].

We start with the following family of compensation schemes.

Definition 3 A compensation scheme for relaxed equivalence $eq(X, X_i)$ satisfies upper-bound iff

$$\theta(x)\,\theta(x_i) = 1, \text{ for all values } x \text{ and their corresponding values } x_i. \qquad (2)$$


The above condition leads to the following interesting guarantee. 

Theorem 2 A compensation scheme that satisfies upper-bound leads to a compensated model whose 
partition function is an upper bound on the exact partition function, and whose MPE value is an 
upper bound on the exact MPE value. 
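
The guarantee can be checked numerically on a small case. The sketch below assumes a toy model $\psi(X) = f(X)\,g(X)$ in which $X$ is cloned to $X_1$ inside $g$ and the constraint $eq(X, X_1)$ is relaxed; the factors, the seed and the helper name `bounds` are hypothetical. The compensation factors are chosen arbitrarily, subject only to the upper-bound condition $\theta(x)\,\theta(x_1) = 1$.

```python
import numpy as np


def bounds(f, g, theta):
    """Return (exact Z, relaxed Z, exact MPE value, relaxed MPE value)."""
    comp_x = f * theta    # component of X:   f(X) theta(X)
    comp_x1 = g / theta   # component of X_1: g(X_1) theta(X_1), with theta(x_1) = 1 / theta(x)
    return (
        (f * g).sum(),
        comp_x.sum() * comp_x1.sum(),
        (f * g).max(),
        comp_x.max() * comp_x1.max(),
    )


rng = np.random.default_rng(2)
f, g = rng.uniform(0.1, 1.0, size=4), rng.uniform(0.1, 1.0, size=4)
for _ in range(3):
    theta = rng.uniform(0.2, 5.0, size=4)
    z, z_relaxed, mpe, mpe_relaxed = bounds(f, g, theta)
    print(f"Z: {z:.4f} <= {z_relaxed:.4f}   MPE: {mpe:.4f} <= {mpe_relaxed:.4f}")
```

The reason the bound holds in this toy case is visible in the code: the relaxed quantities range over all pairs $(x, x_1)$, and on the pairs with $x = x_1$ the compensation factors cancel, so the exact terms are always included.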

Combining the upper-bound condition with pr/mpe-equivalence leads to a compensation scheme 
that characterizes and generalizes dual decomposition approximations, as we show next. 

Definition 4 A compensation scheme satisfies pr-dd iff it satisfies upper-bound and pr-equivalence. 
Moreover, it satisfies mpe-dd iff it satisfies upper-bound and mpe-equivalence. 


The following theorem provides a characterization of the pr-dd and mpe-dd compensation schemes, 
which can be used to search for compensations in fully decomposed models. 


Theorem 3 For a single equivalence constraint $eq(X, X_i)$, a compensation scheme satisfies pr-dd
iff for all values $x$, and their corresponding values $x_i$, the compensated model satisfies

$$\theta(x) = \sqrt{\frac{Z(x_i)/\theta(x_i)}{Z(x)/\theta(x)}} \qquad\qquad
\theta(x_i) = \sqrt{\frac{Z(x)/\theta(x)}{Z(x_i)/\theta(x_i)}} \qquad\qquad (3)$$

The scheme satisfies mpe-dd iff it satisfies the above condition with $mpe(.)$ substituted for $Z(.)$.


There is one subtlety about the above theorem, in comparison to Theorem 1. The equation given
in this theorem can be used as an update equation only when variables $X$ and $X_i$ are independent
in the compensated model (otherwise, the left-hand side will depend on the right-hand side). When
the compensated model is fully decomposed, this condition is met (after taking into account the
division of the compensating factors from the partition function marginals). More generally, when
relaxing the equivalence constraint $eq(X, X_i)$ splits the model into two disconnected components,
one containing $X$ and the other containing $X_i$, the condition is also met.

In fully decomposed models, one can use the above update equation to search for compensations that
satisfy pr-dd or mpe-dd, in the same way that Equation (1) can be used to search for compensations
that satisfy model-split (see Section 3.3). We actually have a stronger result.


Theorem 4 When the compensated model is fully decomposed, the fixed-point iterative updates of
Equation (3) correspond precisely to the block coordinate descent updates of the sum-product and
max-sum diffusion algorithms, respectively.


This theorem has the following main implication: When computing MPE using RCR with an mpe-dd
compensation scheme, one obtains approximations that correspond precisely to those computed by
the dual decomposition technique of Section 2 (assuming a fully decomposed model). In particular,
the MPE computed using RCR corresponds precisely to one computed at a fixed-point of a block 
coordinate descent algorithm such as max-sum diffusion mmm. 

We finally point out that the fixed-point iterative algorithm suggested by Equation (3) also inherits
properties that make block coordinate descent algorithms so popular, such as monotonic improvements
of the approximation (i.e., MPE value or partition function), when equivalence constraints are
updated one at a time [18].
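
The following sketch runs such one-at-a-time updates for the mpe-dd scheme on the fully decomposed three-factor example, using brute-force enumeration as the exact inference engine. It is an illustration under assumptions rather than the authors' implementation: the factor tables are random, the axis layout and helper names (`expand`, `compensated`, `mpe_marginal`) are invented for the example, and the update is taken in the square-root form $\theta(x) \leftarrow \sqrt{[mpe(x_i)/\theta(x_i)] / [mpe(x)/\theta(x)]}$, $\theta(x_i) \leftarrow 1/\theta(x)$, which is what the upper-bound condition combined with mpe-equivalence implies.

```python
import numpy as np

N = 9  # axes: A, B, C, A1, B1, B2, C2, A3, C3 (all binary)
rng = np.random.default_rng(3)
# Hypothetical positive factors psi1(A1,B1), psi2(B2,C2), psi3(A3,C3) with their axes.
factors = [(rng.uniform(0.1, 1.0, (2, 2)), (3, 4)),
           (rng.uniform(0.1, 1.0, (2, 2)), (5, 6)),
           (rng.uniform(0.1, 1.0, (2, 2)), (7, 8))]
# Relaxed constraints eq(X, X_i), given as (axis of X, axis of X_i).
constraints = [(0, 3), (0, 7), (1, 4), (1, 5), (2, 6), (2, 8)]


def expand(table, axes):
    """Reshape a table so that it broadcasts over the joint table with N binary axes."""
    shape = [1] * N
    for ax in axes:
        shape[ax] = 2
    return table.reshape(shape)


def compensated(theta):
    """Joint table of the compensated model; theta(x_i) is fixed to 1 / theta(x)."""
    joint = np.ones((2,) * N)
    for table, axes in factors:
        joint = joint * expand(table, axes)
    for th, (orig, clone) in zip(theta, constraints):
        joint = joint * expand(th, (orig,)) * expand(1.0 / th, (clone,))
    return joint


def mpe_marginal(joint, axis):
    """MPE marginal of one variable: maximize over all other axes."""
    return joint.max(axis=tuple(a for a in range(N) if a != axis))


theta = [rng.uniform(0.5, 2.0, 2) for _ in constraints]
bounds = [compensated(theta).max()]
for sweep in range(20):
    for k, (orig, clone) in enumerate(constraints):
        joint = compensated(theta)
        m_x = mpe_marginal(joint, orig) / theta[k]     # mpe(x) / theta(x)
        m_xi = mpe_marginal(joint, clone) * theta[k]   # mpe(x_i) / theta(x_i)
        theta[k] = np.sqrt(m_xi / m_x)                 # square-root update, one constraint at a time
        bounds.append(compensated(theta).max())

f1, f2, f3 = (t for t, _ in factors)
exact = (f1[:, :, None] * f2[None, :, :] * f3[:, None, :]).max()
print(bounds[0], bounds[-1], exact)
```

Enumerating all 512 joint states is only viable because the example is tiny; any exact inference engine that returns MPE marginals could take the place of `mpe_marginal`. The recorded `bounds` should never increase from one update to the next, and they stay at or above the exact MPE value.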


5 New Recovery Heuristics for Dual Decomposition 

Our main result thus far is that the dual decomposition technique for computing MPE corresponds
to an instance of RCR in which (a) enough equivalence constraints are relaxed to yield a fully
decomposed model and (b) the relaxed equivalences are compensated using the mpe-dd condition.
This, however, corresponds to the degenerate case of RCR. One can obtain much better approximations
by recovering some of the relaxed equivalence constraints, which can be done incrementally
and selectively. In the general RCR framework, this recovery process usually continues until one
runs out of time or until the model is too connected to be amenable to exact inference (which is
needed to search for compensations). As we show in the next section, however, this process can
actually terminate much earlier, as we may be able to detect when the computed MPE is exact.

In this section, however, we will focus our attention on two tasks. First, we design heuristics for
recovering equivalence constraints in the context of the pr-dd and mpe-dd compensation schemes. Second,
we identify a more general update equation than the one of Theorem 3 which, as mentioned earlier,
is only applicable in restricted settings. Such an update equation is necessary if we are to search
for compensations in a model that is not fully decomposed.

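Theorem 5 For a single equivalence constraint $eq(X, X_i)$, with binary variables $X$ and $X_i$ (with
states $x, \bar{x}$ and $x_i, \bar{x}_i$, respectively), a compensation scheme satisfies pr-dd iff the compensated
model satisfies

$$\frac{\theta(x)}{\theta(\bar{x})} =
\left( \frac{Z(\bar{x}, x_i)/\theta(\bar{x})\theta(x_i)}{Z(x, \bar{x}_i)/\theta(x)\theta(\bar{x}_i)} \right)^{\frac{1}{2}}
\qquad\qquad
\frac{\theta(x_i)}{\theta(\bar{x}_i)} =
\left( \frac{Z(x, \bar{x}_i)/\theta(x)\theta(\bar{x}_i)}{Z(\bar{x}, x_i)/\theta(\bar{x})\theta(x_i)} \right)^{\frac{1}{2}}
\qquad\qquad (4)$$

The scheme satisfies mpe-dd iff it satisfies the above condition with $mpe(.)$ substituted for $Z(.)$.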

There are two differences between Equation (4) and the earlier Equation (3). First, the new equation is
applicable even when variables $X$ and $X_i$ are not independent in the compensated model. Hence, we
can use this equation to implement a fixed-point iterative algorithm that searches for compensations
in any model.⁴ Second, the new equation is restricted to binary variables, as we have yet to derive
a version of this for multi-valued variables. Similar to Equation (3), however, the new equation
monotonically improves the approximation, when equivalence constraints are updated one at a time.
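
A minimal sketch of the binary case follows, assuming a two-factor model $f(X,Y)\,g(X_1,Y)$ in which the clone $X_1$ stays connected to $X$ through $Y$ after $eq(X, X_1)$ is relaxed. The factors, the seed and the helper `pair_marginal` are hypothetical, and the update is written in the ratio form $\theta(x)/\theta(\bar{x}) = \sqrt{M(\bar{x}, x_1)/M(x, \bar{x}_1)}$ with $M(x, x_1) = Z(x, x_1)/\theta(x)\theta(x_1)$, fixing $\theta(\bar{x}) = \theta(\bar{x}_1) = 1$; this is a reading of the binary update under these assumptions, not a verbatim transcription.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.uniform(0.1, 1.0, (2, 2))  # hypothetical factor f(X, Y)
g = rng.uniform(0.1, 1.0, (2, 2))  # hypothetical factor g(X_1, Y); X_1 is the clone of X


def pair_marginal(theta_x, theta_x1):
    """Z(x, x_1) in the compensated model f(X,Y) g(X_1,Y) theta(X) theta(X_1)."""
    return np.outer(theta_x, theta_x1) * (f @ g.T)  # (f @ g.T)[x, x_1] sums over y


theta_x, theta_x1 = np.array([3.0, 1.0]), np.array([0.5, 1.0])  # arbitrary starting point
z = pair_marginal(theta_x, theta_x1)
m = z / np.outer(theta_x, theta_x1)   # Z(x, x_1) / theta(x) theta(x_1)
ratio = np.sqrt(m[1, 0] / m[0, 1])    # theta(x) / theta(xbar); index 0 is x, index 1 is xbar
theta_x, theta_x1 = np.array([ratio, 1.0]), np.array([1.0 / ratio, 1.0])

z = pair_marginal(theta_x, theta_x1)
exact_z = (f * g).sum()               # constraint enforced: sum over x, y of f(x,y) g(x,y)
print(z.sum(axis=1), z.sum(axis=0), z.sum(), exact_z)
```

After the update, the two printed marginals $Z(x)$ and $Z(x_1)$ should coincide (pr-equivalence) and the compensated partition function should sit at or above the exact one. The diagonal of `m` also gives, at no extra cost, the partition function that would result from recovering the constraint.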

We now turn our attention to recovery heuristics. Our first observation is as follows: One can
efficiently compute the exact effect of recovering a single equivalence constraint on the quality of
an approximation (i.e., partition function or MPE value). In particular, the improvement due to
recovering a single equivalence constraint can be computed as a side effect of the fixed-point update
by Equation (4).⁵ Thus, our first recovery heuristic imposes no additional overhead, as we can compute
the exact impact of recovering each equivalence constraint during the compensation phase.⁶

This first heuristic, however, may not distinguish each equivalence constraint sufficiently (many 
constraints may have the same impact upon recovery). Thus, we propose a secondary recovery 
heuristic which is specific to mpe-dd and motivated as follows. Given a current model, suppose that 
the recovered MPE instantiation is x and has value m. In general, m is only an upper bound on the 
exact MPE value as instantiation x may violate some relaxed equivalence constraints, eq(X, Xi )— 
that is, instantiation x may set X and Xi to different values. However, if instantiation x does 
not violate any of the relaxed equivalence constraints, then m must be the exact MPE value. Our 
secondary recovery heuristic will therefore recover those equivalence constraints that are currently 
violated by the instantiation x. By recovering such equivalence constraints, we hope to reduce the 
number of violated equivalence constraints in our approximate MPE instantiation, and thus hope to 
recover an exact MPE instantiation; cf. reducing the duality gap as in ED- 
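
The secondary heuristic can be sketched on the fully decomposed three-factor example with unit compensation factors. In that setting the original variables are unconstrained, so a violated constraint on $X$ shows up as two clones of $X$ taking different values. The factor tables and the names `clone_pairs` and `violated` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
# Hypothetical positive factors psi1[a1, b1], psi2[b2, c2], psi3[a3, c3].
psi1, psi2, psi3 = (rng.uniform(0.1, 1.0, (2, 2)) for _ in range(3))

# MPE instantiation of the fully decomposed model with unit compensations:
# each factor is maximized independently over its own clones.
a1, b1 = np.unravel_index(psi1.argmax(), psi1.shape)
b2, c2 = np.unravel_index(psi2.argmax(), psi2.shape)
a3, c3 = np.unravel_index(psi3.argmax(), psi3.shape)
bound = psi1.max() * psi2.max() * psi3.max()

# Variables whose clones disagree: at least one of their constraints is violated,
# so these constraints are the candidates for recovery.
clone_pairs = {"A": (a1, a3), "B": (b1, b2), "C": (c2, c3)}
violated = [name for name, (u, v) in clone_pairs.items() if u != v]

exact = (psi1[:, :, None] * psi2[None, :, :] * psi3[:, None, :]).max()
print("violated:", violated, "bound:", bound, "exact:", exact)
```

When the `violated` list comes back empty, the clones define a consistent instantiation of the original variables and the bound equals the exact MPE value, which is the termination test described above.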

Consider, in contrast, the “recovery” heuristic suggested by [19], which introduced local consistency
constraints to tighten a linear programming (LP) relaxation that corresponds to the dual objective
of dual decomposition. This heuristic sought to tighten an outer bound on the marginal polytope,
which would normally require exponentially many linear constraints in an LP that would exactly
solve an MPE problem. The “recovery” heuristic suggested by [19] introduces local consistency
constraints over triplet clusters, which was particularly effective at solving challenging classes of
MPE problems, such as protein design problems [20]. However, introducing triplet constraints by
themselves may not be sufficient to completely tighten the dual bound, and otherwise, there are
exponentially many local consistency constraints available to choose from. In contrast, the RCR
recovery process yields an incremental and full spectrum of approximations, leading up to exact
inference when all equivalence constraints have been recovered. Thus, we view RCR recovery as
a complementary approach to the techniques of ED, when triplet constraints are not sufficient to
extract the exact MPE solution.

4 In our implementation, we simply set $\theta(\bar{x}) = \theta(\bar{x}_i) = 1$.

5 The partition function after recovering a single constraint $eq(X, X_i)$ is $\sum_x Z(x, x_i)/\theta(x)\theta(x_i)$. Moreover,
the MPE value after recovering the constraint is $\max_x mpe(x, x_i)/\theta(x)\theta(x_i)$.

6 Note, however, that subsequent fixed-point updates for other equivalence constraints will in principle
invalidate the measured impacts of previous constraints. On the other hand, computing this impact requires
computations that would allow us to perform an update anyway.


6 An Empirical Perspective 

We evaluate our new recovery heuristics based on their ability to extract an exact MPE solution for a
given probabilistic graphical model. In our first set of experiments, our goal is to illustrate that RCR
can obtain an exact MPE solution by recovering equivalence constraints, without much impact on the
complexity of inference. For our second set of experiments, we compared RCR with MPLP in their
ability to find exact MPE solutions based on their respective approaches to tightening a relaxation,
which is by adding triplet clusters in the case of MPLP [19].⁷ Our goal here is to illustrate that
recovering equivalence constraints can also be a viable option for models where introducing triplet
clusters alone is not sufficient to tighten the dual objective of dual decomposition.

For RCR, starting with a fully decomposed model, we iteratively recover 5 equivalence constraints 
at a time, as described in the previous section. For MPLP, we used the default settings, which 
introduced 5 triplet clusters at a time. RCR was set, as MPLP was, to run for at most 1000 iterations, 
before recovering equivalence constraints and introducing triplet clusters. 

As the RCR approach requires only a black-box inference engine to execute its compensation phase
(which requires only marginals, or alternatively, partial derivatives), we can take advantage of
state-of-the-art systems for exact inference. This includes advanced approaches for inference based on
arithmetic circuits (ACs), which can effectively exploit local structure [21] [22]. We use such an
inference engine for our experiments, although the benchmarks that we considered do not necessarily
have much local structure. Using arithmetic circuits, we can also more efficiently compute quantities
such as $mpe(x, x_i)/\theta(x)\theta(x_i)$ via lazy evaluation in an arithmetic circuit [23].

We first performed experiments on 50 randomly parameterized grid models, which we generated 
using MPLP with default parameters, but assuming binary variables. The resulting 10 x 10 grids 
corresponded to pairwise MRFs with mixed attractive and repulsive couplings. The following table 
summarizes the number of equivalence constraints (out of 360 relaxed) that needed to be recovered 
for RCR to obtain an optimal MPE solution, and the corresponding complexity of inference (on 
average). Note that the complexity of inference using arithmetic circuits is linear in the size of the 
AC, i.e., the number of nodes and edges in the resulting circuit. 


edges recovered    % instances    % increase in AC size
91-120             4%             88.11%
121-150            16%            93.58%
151-180            12%            89.31%
181-210            18%            103.17%
211-240            24%            100.43%
241-270            12%            113.78%
271-300            6%             195.41%
301-330            8%             308.39%


Observe that RCR was able to recover up to 240 equivalence constraints, and solve 74% of all MPE
problems, without much increase (and in many cases even a decrease) in the complexity of inference.
Note that we start with a fully decomposed approximation, and it is easily possible to recover many
equivalence constraints without much impact on the treewidth of a model (it is possible to recover
200 and only obtain a spanning tree). Moreover, AC size can decrease since there are fewer compensating
factors to maintain. MPLP is also effective on this benchmark, where it can introduce square
clusters into its relaxation m, although such a technique is restricted to grids.

We next performed experiments on Bayesian networks induced from haplotype data (over 201 binary
variables), which are networks with bounded treewidth [24]. These networks do not necessarily
have a regular structure that suggests a natural way of introducing clusters, as grids do.
Moreover, note that triplet clusters alone may not be sufficient to tighten the dual objective, i.e., to
close the duality gap. In these benchmarks, there were 69 models, of which 13 models were cases
where MPLP failed to find the optimal MPE solution, given 1000 attempts to tighten its relaxation
(i.e., to introduce local consistency constraints). In contrast, RCR was able to obtain the optimal
MPE solution in all cases, after recovering a small number of equivalence constraints.

7 A public version of MPLP is available at http://cs.nyu.edu/~dsontag/. In our second set of
experiments, we used an updated implementation of MPLP that was provided to us by the authors of ED.

Figure 1: Recovering triplet clusters and equivalence constraints in MPLP (left) and RCR (right).
Solid lines indicate the value of the dual objective (upper bound), dashed lines indicate the value of
the current best assignment (lower bound), and the dotted line denotes the optimal MPE solution.

Figure 1 illustrates an example run of both MPLP and RCR, in a model where MPLP failed to find
an optimal MPE solution. For the case of MPLP, one observes that MPLP starts to tighten the gap 
between its upper and lower bounds, but fails to tighten it further after some number of iterations. 
In fact, for this particular model, MPLP fails to find triplet clusters to introduce into its relaxation. 
On the other hand, RCR obtains the optimal solution after recovering only 70 of 451 equivalence 
constraints. When we look at the arithmetic circuits used to do inference in our simplified model, the 
size goes down from 38555 to 36729 nodes and edges after recovering 70 equivalence constraints. 

In the following table, we summarize the number of recovered equivalence constraints needed to
obtain an optimal solution, and the complexity of inference, for the two cases:

                        # of models    avg. % recovered    avg. % increase in AC size
MPLP did not solve      13             26.93%              124.97%
RCR and MPLP solved     56             3.56%               99.65%


In the models that were left unsolved by MPLP, RCR was able to find exact MPE solutions by 
recovering only a quarter of the relaxed equivalence constraints, on average. This came with only a 
modest increase in the complexity of inference, i.e., AC size. In the models solved by both MPLP 
and RCR, very few equivalence constraints needed to be recovered on average, and in fact led to a 
very slight decrease in the complexity of inference. 

We finally remark that the second set of experiments involved models that are not necessarily well
suited for recovering triplet clusters with MPLP. Moreover, our comparisons with RCR were limited
since we were restricted to models over binary variables (as recovery requires the use of a compensation
algorithm like the one implied by Theorem 5, which is specific to binary variables). We plan
more thorough empirical comparisons in future work.


7 Conclusion 

In this paper, we formulated the technique of dual decomposition in the terms of Relax, Compensate
and then Recover (RCR). By formulating dual decomposition in the more general terms of
RCR, we have broadened the scope of the technique by (a) proposing new recovery heuristics for
tightening the dual objective of dual decomposition, (b) extending it to other inference tasks, such
as bounding the partition function (although this was not evaluated here), and (c) formulating it in
terms that allow it to easily take advantage of the vast literature on exact inference, for the purposes
of more effective approximate inference. Empirically, we showed how these new recovery
heuristics can sometimes be used to obtain exact solutions to MPE problems, without much increase
in the complexity of inference, in particular on problems for which existing systems based on dual
decomposition are not as well suited.

A Proofs

We first review and refine some notation and some definitions, for the purposes of our proofs. Here,
variables are denoted by upper case letters ($X$) and their instantiations by lower case letters ($x$).
Moreover, sets of variables are denoted by bold upper case letters ($\mathbf{X}$) and their instantiations by
bold lower case letters ($\mathbf{x}$).

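An MRF $\psi(\mathbf{X})$, over a set of variables $\mathbf{X}$, is a product of factors $\psi_i$, which induces a probability
distribution $\Pr(\mathbf{X})$:

\[ \psi(\mathbf{X}) = \prod_i \psi_i(\mathbf{X}_i) \qquad \Pr(\mathbf{X}) = \frac{1}{Z} \psi(\mathbf{X}). \]

Here, each factor $\psi_i(\mathbf{X}_i)$ is a function mapping an instantiation $\mathbf{x}_i$ of variables $\mathbf{X}_i$ to a non-negative
real number. Moreover, $Z = \sum_{\mathbf{x}} \psi(\mathbf{x})$ is a normalizing constant called the partition function.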
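
As a minimal sketch of these definitions, the following brute-force enumeration computes the partition function and the induced distribution of a small MRF. The variables `A`, `B`, `C` and the factor values are hypothetical, chosen only for illustration; enumeration is exponential in the number of variables and is not how one would do inference in practice.

```python
from itertools import product

# Hypothetical toy MRF: binary variables A, B, C and two made-up factors.
variables = ["A", "B", "C"]
factors = [
    (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    (("B", "C"), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}),
]

def psi(x):
    """Product of all factors at the full instantiation x (a dict)."""
    value = 1.0
    for scope, table in factors:
        value *= table[tuple(x[v] for v in scope)]
    return value

def instantiations():
    """Enumerate every instantiation of the variables."""
    for values in product([0, 1], repeat=len(variables)):
        yield dict(zip(variables, values))

# Partition function: the sum of psi over all instantiations.
Z = sum(psi(x) for x in instantiations())
# Induced distribution, keyed by the tuple of states (a, b, c).
Pr = {tuple(x.values()): psi(x) / Z for x in instantiations()}
```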

We are interested in approximations to the partition function, and the most probable explanation
(MPE):

\[ \mathrm{mpe} = \max_{\mathbf{x}} \psi(\mathbf{x}) \]

We refer to mpe as the MPE value, and a maximizing $\mathbf{x}$ as an MPE instantiation. We are also
interested in MPE marginals $\mathrm{mpe}(x)$ and partition function marginals $Z(x)$:

\[ \mathrm{mpe}(x) = \max_{\mathbf{x} \models x} \psi(\mathbf{x}) \qquad Z(x) = \sum_{\mathbf{x} \models x} \psi(\mathbf{x}) \]

where $\mathrm{mpe}(x)$ can be interpreted as the MPE value of our model, assuming variable $X$ takes on the
value $x$; similarly for partition function marginals.
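
The MPE value and the two kinds of marginals can be illustrated the same way. This is a sketch over a hypothetical three-variable MRF (the factor values are made up); the helper names `mpe_marginal` and `z_marginal` are ours, not the paper's.

```python
from itertools import product

# Hypothetical toy MRF: binary variables A, B, C and two made-up factors.
variables = ["A", "B", "C"]
factors = [
    (("A", "B"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    (("B", "C"), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}),
]

def psi(x):
    """Product of all factors at the full instantiation x (a dict)."""
    value = 1.0
    for scope, table in factors:
        value *= table[tuple(x[v] for v in scope)]
    return value

all_x = [dict(zip(variables, v)) for v in product([0, 1], repeat=len(variables))]

mpe = max(psi(x) for x in all_x)           # MPE value
mpe_instantiation = max(all_x, key=psi)    # a maximizing instantiation

def compatible(var, state):
    """Instantiations that set variable `var` to `state`."""
    return [x for x in all_x if x[var] == state]

def mpe_marginal(var, state):
    """mpe(x): the MPE value assuming `var` takes on `state`."""
    return max(psi(x) for x in compatible(var, state))

def z_marginal(var, state):
    """Z(x): the partition function restricted to `var` = `state`."""
    return sum(psi(x) for x in compatible(var, state))
```

Maximizing an MPE marginal over the states of any one variable gives back the MPE value, and summing a partition function marginal over those states gives back $Z$.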

We may augment an MRF so that it contains factors $\mathrm{eq}(X, Y)$ that represent equivalence constraints
$X \equiv Y$ between pairs of variables $X$ and $Y$ in $\mathbf{X}$. For the purposes of this paper, we will assume
that equivalence constraints arise by cloning a variable $X$ that appears in a factor $\psi_i(\mathbf{X}_i)$ (although
our results hold for equivalence constraints in general). We will denote this clone by $X_i$, and assume
an equivalence constraint $\mathrm{eq}(X, X_i)$. We continue to denote the set of original variables by $\mathbf{X}$, but
we now denote the set of clone variables by $\mathbf{X}^c$. Our MRF with equivalence constraints is thus:

\[ \psi(\mathbf{X}, \mathbf{X}^c) = \prod_i \psi_i(\mathbf{X}_i^c) \cdot \prod_{X \equiv X_i} \mathrm{eq}(X, X_i) \]

Note that the distribution and the MPE problem (over the original variables $\mathbf{X}$), as well as the
partition function, are all invariant to the introduction of equivalence constraints, as described above.
Moreover, whenever we refer to a state $x$ of variable $X$, we will denote the corresponding state of
the clone $X_i$ by $x_i$, unless otherwise stated.
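
The invariance claim can be checked on a small case. In this sketch (hypothetical factor values), variable `B` of the second factor is replaced by a clone `B2`, and a factor `eq(B, B2)` ties the clone back to the original; the partition function and MPE value are unchanged.

```python
from itertools import product

# Hypothetical factors psi_1(A, B) and psi_2(B, C) over binary variables.
psi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi2 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

def eq(x, y):
    """Equivalence constraint as a 0/1 factor."""
    return 1.0 if x == y else 0.0

def original(a, b, c):
    return psi1[a, b] * psi2[b, c]

def with_clone(a, b, b2, c):
    # psi_2 now mentions the clone B2; eq(B, B2) forces B2 to agree with B.
    return psi1[a, b] * psi2[b2, c] * eq(b, b2)

Z = sum(original(*x) for x in product([0, 1], repeat=3))
Z_clone = sum(with_clone(*x) for x in product([0, 1], repeat=4))
mpe = max(original(*x) for x in product([0, 1], repeat=3))
mpe_clone = max(with_clone(*x) for x in product([0, 1], repeat=4))
```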

We can relax an equivalence constraint $\mathrm{eq}(X, X_i)$ by removing its factor from the MRF, and then
compensate for the relaxation by introducing two unit factors $\theta(X)$ and $\theta(X_i)$. Doing so, for all
equivalence constraints, we obtain a simpler MRF and distribution

\[ \psi'(\mathbf{X}, \mathbf{X}^c) = \prod_i \psi_i(\mathbf{X}_i^c) \cdot \prod_{X \equiv X_i} \theta(X)\,\theta(X_i) \qquad \Pr'(\mathbf{X}, \mathbf{X}^c) = \frac{1}{Z'} \psi'(\mathbf{X}, \mathbf{X}^c) \]

where $Z'$ is the corresponding partition function. Note that each constraint $\mathrm{eq}(X, X_i)$ is associated
with unique factors $\theta(X)$ and $\theta(X_i)$, which we may sometimes distinguish by $\theta_{X_i}(X)$ and $\theta_X(X_i)$.


Theorem 1 A compensation scheme for relaxed equivalence constraint $\mathrm{eq}(X, X_i)$ satisfies model-split
iff the partition function $Z'$ of the compensated model satisfies

\[ \theta(x) = \alpha \frac{Z'(x_i)}{\theta(x_i)} \qquad \theta(x_i) = \alpha \frac{Z'(x)}{\theta(x)} \tag{1} \]

for all states $x$, and their corresponding states $x_i$. Here, $\alpha$ is an arbitrary normalizing constant.


Proof See 0. 


□ 
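
To make Equation 1 concrete, the sketch below iterates it on a hypothetical model in which relaxing $\mathrm{eq}(B, B_2)$ disconnects the two factors. The factor values are made up, marginals are computed by enumeration, and normalizing each unit factor plays the role of the constant $\alpha$. In this particular example the compensated marginals of `B` and `B2` agree with each other and with the exact marginal of `B`.

```python
from itertools import product

# Hypothetical factors: psi_1(A, B) and psi_2(B2, C), where B2 clones B.
psi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi2 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

def marginals(theta, theta_c):
    """Z'(b) and Z'(b2) of the relaxed, compensated model, by enumeration."""
    z_b, z_b2 = [0.0, 0.0], [0.0, 0.0]
    for a, b, b2, c in product([0, 1], repeat=4):
        w = psi1[a, b] * psi2[b2, c] * theta[b] * theta_c[b2]
        z_b[b] += w
        z_b2[b2] += w
    return z_b, z_b2

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

theta, theta_c = [0.5, 0.5], [0.5, 0.5]   # theta(b) and theta(b2)
for _ in range(10):
    z_b, z_b2 = marginals(theta, theta_c)
    # Equation 1, applied to both unit factors at once.
    new_theta = normalize([z_b2[s] / theta_c[s] for s in (0, 1)])
    new_theta_c = normalize([z_b[s] / theta[s] for s in (0, 1)])
    theta, theta_c = new_theta, new_theta_c

z_b, z_b2 = marginals(theta, theta_c)
pr_b, pr_b2 = normalize(z_b), normalize(z_b2)

# Exact marginal of B in the original model psi_1(A, B) * psi_2(B, C).
exact_pr_b = normalize([
    sum(psi1[a, b] * psi2[b, c] for a, c in product([0, 1], repeat=2))
    for b in (0, 1)
])
```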


Theorem 2 A compensation scheme that satisfies upper-bound leads to a compensated model whose 
partition function is an upper bound on the exact partition function, and whose MPE value is an 
upper bound on the exact MPE value. 

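Proof Consider an equivalence constraint $\mathrm{eq}(X, X_i)$. If variable $X$ is set to the value $x$, and its
clone $X_i$ is set to the corresponding value $x_i$, then $\mathrm{eq}(x, x_i) = 1 = \theta(x)\theta(x_i)$ for a compensation
satisfying upper-bound. When $x \neq x_i$, we have $\mathrm{eq}(x, x_i) = 0 \le \theta(x)\theta(x_i)$. Moreover, writing
$\psi(\mathbf{x}, \mathbf{x}^c)$ for the MRF with equivalence constraints, $\psi(\mathbf{x}, \mathbf{x}^c) = \psi'(\mathbf{x}, \mathbf{x}^c)$ if instantiation $\mathbf{x}, \mathbf{x}^c$ satisfies
all equivalence constraints, and $\psi(\mathbf{x}, \mathbf{x}^c) = 0 \le \psi'(\mathbf{x}, \mathbf{x}^c)$ when instantiation $\mathbf{x}, \mathbf{x}^c$ does not.
Thus, $0 \le \psi(\mathbf{x}, \mathbf{x}^c) \le \psi'(\mathbf{x}, \mathbf{x}^c)$ for all instantiations $\mathbf{x}$ and $\mathbf{x}^c$.

Since the MPE value is invariant to the introduction of equivalence constraints, the MPE of a
compensated model is thus an upper bound on the MPE of the original:

\[ \max_{\mathbf{x}} \psi(\mathbf{x}) = \max_{\mathbf{x}, \mathbf{x}^c :\, X \equiv X_i} \psi'(\mathbf{x}, \mathbf{x}^c) \le \max_{\mathbf{x}, \mathbf{x}^c} \psi'(\mathbf{x}, \mathbf{x}^c). \]

Here the second maximization is constrained to assignments $\mathbf{x}, \mathbf{x}^c$ that satisfy all equivalence
constraints $\mathrm{eq}(X, X_i)$. Similarly, for the partition function:

\[ Z = \sum_{\mathbf{x}} \psi(\mathbf{x}) = \sum_{\mathbf{x}, \mathbf{x}^c :\, X \equiv X_i} \psi'(\mathbf{x}, \mathbf{x}^c) \le \sum_{\mathbf{x}, \mathbf{x}^c} \psi'(\mathbf{x}, \mathbf{x}^c) = Z'. \]

□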
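
The two bounds can be observed numerically. In this sketch the factor values and the choice of $\theta$ are hypothetical; the only property used is the upper-bound condition $\theta(x)\theta(x_i) = 1$, enforced by setting the clone's unit factor to the reciprocal of the original's.

```python
from itertools import product

# Hypothetical factors: psi_1(A, B) and psi_2(B, C); B2 is the clone of B.
psi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi2 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

theta = [2.0, 0.5]                   # theta(b): any positive values
theta_c = [1.0 / t for t in theta]   # theta(b2) = 1 / theta(b): upper-bound

def original(a, b, c):
    return psi1[a, b] * psi2[b, c]

def compensated(a, b, b2, c):
    # eq(B, B2) has been relaxed and replaced by the two unit factors.
    return psi1[a, b] * psi2[b2, c] * theta[b] * theta_c[b2]

Z = sum(original(*x) for x in product([0, 1], repeat=3))
Z_prime = sum(compensated(*x) for x in product([0, 1], repeat=4))
mpe = max(original(*x) for x in product([0, 1], repeat=3))
mpe_prime = max(compensated(*x) for x in product([0, 1], repeat=4))

# On instantiations that satisfy the constraint, the two models agree.
agree = all(
    abs(compensated(a, b, b, c) - original(a, b, c)) < 1e-12
    for a, b, c in product([0, 1], repeat=3)
)
```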


Theorem 3 For a single equivalence constraint $\mathrm{eq}(X, X_i)$, a compensation scheme satisfies pr-dd
iff for all values $x$, and their corresponding values $x_i$, the compensated model satisfies

\[ \theta(x) = \left( \frac{Z'(x_i)/\theta(x_i)}{Z'(x)/\theta(x)} \right)^{\frac{1}{2}} \qquad \theta(x_i) = \left( \frac{Z'(x)/\theta(x)}{Z'(x_i)/\theta(x_i)} \right)^{\frac{1}{2}} \tag{3} \]

The scheme satisfies mpe-dd iff it satisfies the above condition with $\mathrm{mpe}'(.)$ substituted for $Z'(.)$.


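Proof From the definition of pr-equivalence, we first have:

\[ \Pr'(x) = \frac{1}{Z'} \frac{\partial Z'}{\partial \theta(x)} \theta(x) = \frac{1}{Z'} \frac{\partial Z'}{\partial \theta(x_i)} \theta(x_i) = \Pr'(x_i) \]

for all values $x$, and $x_i$, respectively. For a compensation satisfying upper-bound, we can substitute
$\theta(x_i) = \frac{1}{\theta(x)}$ and solve for $\theta(x)$, giving us fixed-point conditions:

\[ \theta(x) = \left( \frac{\partial Z'/\partial \theta(x_i)}{\partial Z'/\partial \theta(x)} \right)^{\frac{1}{2}} \]

We further remark that $\frac{\partial Z'}{\partial \theta(x)}$ is independent of the unit factor $\theta(x)$ since the partition function $Z'$ is
linear in $\theta(x)$. Moreover, we can compute $\frac{\partial Z'}{\partial \theta(x)}$ by $\frac{Z'(x)}{\theta(x)}$ when $\theta(x)$ is positive. Otherwise, partial
derivatives can be computed efficiently in traditional frameworks for inference, as in [11, 12].

The derivation is analogous for MPE, starting from the definition of mpe-equivalence. □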
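
The fixed-point condition of Equation 3 can be turned into an iterative update. The following is a sketch on a hypothetical model with a single relaxed constraint $\mathrm{eq}(B, B_2)$; the factor values are made up and the marginals $Z'(\cdot)$ are computed by enumeration rather than by an inference engine. After the iteration, the marginals of `B` and `B2` agree (pr-equivalence) while $\theta(b)\theta(b_2) = 1$ (upper-bound), so $Z'$ remains an upper bound on $Z$.

```python
import math
from itertools import product

# Hypothetical factors: psi_1(A, B) and psi_2(B2, C), where B2 clones B.
psi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi2 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

def marginals(theta, theta_c):
    """Z'(b) and Z'(b2) of the relaxed, compensated model, by enumeration."""
    z_b, z_b2 = [0.0, 0.0], [0.0, 0.0]
    for a, b, b2, c in product([0, 1], repeat=4):
        w = psi1[a, b] * psi2[b2, c] * theta[b] * theta_c[b2]
        z_b[b] += w
        z_b2[b2] += w
    return z_b, z_b2

theta, theta_c = [1.0, 1.0], [1.0, 1.0]   # theta(b) and theta(b2)
for _ in range(20):
    z_b, z_b2 = marginals(theta, theta_c)
    # Equation 3: ratio of the partial derivatives Z'(.)/theta(.), square-rooted.
    theta = [math.sqrt((z_b2[s] / theta_c[s]) / (z_b[s] / theta[s]))
             for s in (0, 1)]
    theta_c = [1.0 / t for t in theta]     # upper-bound condition

z_b, z_b2 = marginals(theta, theta_c)
Z_prime = sum(z_b)
Z = sum(psi1[a, b] * psi2[b, c] for a, b, c in product([0, 1], repeat=3))
```

For this example, where the relaxed constraint disconnects the model, the update settles after its first application.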


Theorem 4 When the compensated model is fully decomposed, the fixed-point iterative updates of
Equation 3 correspond precisely to the block coordinate descent updates of the sum-product and
max-sum diffusion algorithms, respectively.

Figure 2: On the left is an MRF with two factors, $\psi_1(A, C, D)$ and $\psi_2(A, B, D)$. On the right is the
MRF found by cloning all variables, and then relaxing the 6 resulting equivalence constraints (indicated
by dashed lines). Besides the two original factors, now over cloned variables, $\psi_1(A_1, C_1, D_1)$
and $\psi_2(A_2, B_2, D_2)$, we now have twelve compensating factors, two each for the six equivalence
constraints relaxed: one factor $\theta_X(X_i)$ at each of the six cloned variables $X_i$, and one factor $\theta_{X_i}(X)$
each for variables $B$ and $C$ (involved in one equivalence constraint), and two factors $\theta_{X_i}(X)$ each for
variables $A$ and $D$ (involved in two equivalence constraints).


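Proof of Theorem 4 Consider an MRF found by taking each factor $\psi_i(\mathbf{X}_i)$ and each variable $X \in \mathbf{X}_i$,
and then:

1. replace variable $X$ with a unique clone variable $X_i$, and

2. introduce an equivalence constraint $\mathrm{eq}(X, X_i)$.

When we relax all equivalence constraints, the resulting model is fully decomposed, where all of
the factors $\psi_i(\mathbf{X}_i^c)$, now over clone variables $\mathbf{X}_i^c$, are disconnected. We add compensating factors
$\theta_{X_i}(X)$ and $\theta_X(X_i)$, where $X$ denotes the original variable and $X_i$ its clone, for each equivalence
constraint $\mathrm{eq}(X, X_i)$ relaxed. The resulting MRF, over original variables $\mathbf{X}$ and clone variables $\mathbf{X}^c$, is:

\[ \psi'(\mathbf{X}, \mathbf{X}^c) = \Big[ \prod_i \psi_i(\mathbf{X}_i^c) \Big] \cdot \Big[ \prod_{X \equiv X_i} \theta_{X_i}(X)\,\theta_X(X_i) \Big] \]

\[ \phantom{\psi'(\mathbf{X}, \mathbf{X}^c)} = \Big[ \prod_i \psi_i(\mathbf{X}_i^c) \prod_{X_i \in \mathbf{X}_i^c} \theta_X(X_i) \Big] \cdot \Big[ \prod_{X} \prod_{i :\, X \in \mathbf{X}_i} \theta_{X_i}(X) \Big] \]

Note that each factor $\psi_i(\mathbf{X}_i^c)$ is now associated with a unit factor $\theta_X(X_i)$ for each equivalence
constraint $\mathrm{eq}(X, X_i)$ that the factor was involved in: one for each $X_i \in \mathbf{X}_i^c$. Each variable $X$ is
associated with a unit factor $\theta_{X_i}(X)$ for each equivalence constraint $\mathrm{eq}(X, X_i)$ that variable $X$ was
involved in: one for each factor $\psi_i(\mathbf{X}_i)$, where $X \in \mathbf{X}_i$. Figure 2 highlights a decomposition for a
simple MRF.

Now, consider an equivalence constraint $\mathrm{eq}(X, X_i)$ in our compensated MRF $\mathcal{M}'$. Since the MRF
is disconnected, the factor $\theta_{X_i}(X)$ interacts only with the compensating factors over variable $X$.
Similarly, the factor $\theta_X(X_i)$ interacts only with the factor $\psi_i(\mathbf{X}_i^c)$, and the other compensating
factors over the other clone variables in $\mathbf{X}_i^c$. Thus, our partial derivatives have the following form:

\[ \frac{\partial Z'}{\partial \theta_{X_i}(x)} \propto \prod_{j \neq i :\, X \in \mathbf{X}_j} \theta_{X_j}(x) \qquad \frac{\partial Z'}{\partial \theta_X(x_i)} \propto \sum_{\mathbf{x}_i^c \models x_i} \psi_i(\mathbf{x}_i^c) \cdot \prod_{Y_i \in \mathbf{X}_i^c :\, Y \neq X} \theta_Y(y_i) \]

Note again that we can compute the partial derivatives by $\frac{Z'(x)}{\theta_{X_i}(x)}$ when $\theta_{X_i}(x)$ is positive.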

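For the MPE problem, we are interested in computing $\frac{\mathrm{mpe}'(x)}{\theta_{X_i}(x)}$, which has a form analogous to the
above, except with maximizations instead of summations. Moreover, $\frac{\mathrm{mpe}'(x)}{\theta_{X_i}(x)}$ is independent of the
parameter $\theta_{X_i}(x)$ (after taking into account the division). The resulting fixed-point updates for the
log parameters are now:

\[ \log \theta_{X_i}(x) = -\frac{1}{2} \log \frac{\mathrm{mpe}'(x)}{\theta_{X_i}(x)} + \frac{1}{2} \log \frac{\mathrm{mpe}'(x_i)}{\theta_X(x_i)} \]

\[ = -\frac{1}{2} \sum_{j \neq i :\, X \in \mathbf{X}_j} \log \theta_{X_j}(x) + \frac{1}{2} \max_{\mathbf{x}_i^c \models x_i} \Big[ \log \psi_i(\mathbf{x}_i^c) + \sum_{Y_i \in \mathbf{X}_i^c :\, Y \neq X} \log \theta_Y(y_i) \Big] + \log \alpha \]

\[ = -\frac{1}{2} \sum_{j \neq i :\, X \in \mathbf{X}_j} \log \theta_{X_j}(x) + \frac{1}{2} \max_{\mathbf{x}_i^c \models x_i} \Big[ \log \psi_i(\mathbf{x}_i^c) - \sum_{Y_i \in \mathbf{X}_i^c :\, Y \neq X} \log \theta_{Y_i}(y) \Big] + \log \alpha \]

where we substitute $\log \theta_Y(y_i)$ with $-\log \theta_{Y_i}(y)$, from our upper-bound condition. Here, $\alpha$ can
be treated as a normalizing constant, which we can ignore, since it is canceled out in the joint
distribution of the compensated MRF. We thus arrive at the block coordinate descent update of the
max-sum diffusion algorithm, as in [8, Equation 1.17]. □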
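
The log-domain update above can be sketched on the fully decomposed model of Figure 2. The factor values here are hypothetical (drawn from a seeded random generator), and the names `t`, `clone_side` and `variable_side` are ours. The sketch records $\log \mathrm{mpe}'$ after every sweep; because each update is a block coordinate descent step, this bound should never increase and should never fall below the exact $\log \mathrm{mpe}$.

```python
import math
import random
from itertools import product

rng = random.Random(0)  # hypothetical factor values, seeded for reproducibility
scopes = {1: ("A", "C", "D"), 2: ("A", "B", "D")}  # factor scopes as in Figure 2
log_psi = {i: {v: math.log(rng.uniform(0.5, 2.0))
               for v in product([0, 1], repeat=3)} for i in scopes}
variables = ["A", "B", "C", "D"]
# t[X, i][x] holds log theta_{X_i}(x). Under upper-bound, the unit factor at
# the clone is log theta_X(x_i) = -t[X, i][x].
t = {(X, i): [0.0, 0.0] for i in scopes for X in scopes[i]}

def factor_score(i, vals, skip=None):
    """log psi_i plus the log clone factors, optionally leaving one variable out."""
    return log_psi[i][vals] - sum(
        t[Y, i][y] for Y, y in zip(scopes[i], vals) if Y != skip)

def clone_side(X, i, x):
    """Max over instantiations of factor i that set the clone of X to x."""
    pos = scopes[i].index(X)
    return max(factor_score(i, v, skip=X)
               for v in product([0, 1], repeat=3) if v[pos] == x)

def variable_side(X, i, x):
    """Sum of log theta_{X_j}(x) over the other factors j that mention X."""
    return sum(t[X, j][x] for j in scopes if j != i and X in scopes[j])

def log_mpe_relaxed():
    """log mpe' of the decomposed model: each component is maximized alone."""
    var_part = sum(
        max(sum(t[X, j][x] for j in scopes if X in scopes[j]) for x in (0, 1))
        for X in variables)
    fac_part = sum(
        max(factor_score(i, v) for v in product([0, 1], repeat=3))
        for i in scopes)
    return var_part + fac_part

def exact_log_mpe():
    return max(log_psi[1][a, c, d] + log_psi[2][a, b, d]
               for a, b, c, d in product([0, 1], repeat=4))

bounds = [log_mpe_relaxed()]
for _ in range(20):
    for X, i in t:
        # Fixed-point update for one constraint eq(X, X_i); log alpha is dropped.
        t[X, i] = [0.5 * clone_side(X, i, x) - 0.5 * variable_side(X, i, x)
                   for x in (0, 1)]
    bounds.append(log_mpe_relaxed())
```

Whether the final bound is tight depends on the model; the sketch only illustrates monotonicity and the upper-bound property.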


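Theorem 5 For a single equivalence constraint $\mathrm{eq}(X, X_i)$, with binary variables $X$ and $X_i$, a
compensation scheme satisfies pr-dd iff the compensated model satisfies

\[ \frac{\theta(x)}{\theta(\bar{x})} = \left( \frac{Z'(\bar{x}, x_i)/\theta(\bar{x})\theta(x_i)}{Z'(x, \bar{x}_i)/\theta(x)\theta(\bar{x}_i)} \right)^{\frac{1}{2}} \qquad \frac{\theta(x_i)}{\theta(\bar{x}_i)} = \left( \frac{Z'(x, \bar{x}_i)/\theta(x)\theta(\bar{x}_i)}{Z'(\bar{x}, x_i)/\theta(\bar{x})\theta(x_i)} \right)^{\frac{1}{2}} \]

The scheme satisfies mpe-dd iff it satisfies the above condition with $\mathrm{mpe}'(.)$ substituted for $Z'(.)$.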


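Proof First, note that:

\[ \frac{Z'(x, x_i)}{\theta(x)\theta(x_i)} = \frac{\partial^2 Z'}{\partial \theta(x)\, \partial \theta(x_i)} \tag{5} \]

which is a quantity that is independent of both of the unit factors $\theta(x)$ and $\theta(x_i)$, since the partition
function $Z'$ is linear in $\theta(x)$, and linear in $\theta(x_i)$.

For binary variables $X$ and $X_i$ we have

\[ \Pr'(x) = \Pr'(x, x_i) + \Pr'(x, \bar{x}_i) = \frac{1}{Z'} \frac{\partial^2 Z'}{\partial \theta(x)\, \partial \theta(x_i)} \theta(x)\theta(x_i) + \frac{1}{Z'} \frac{\partial^2 Z'}{\partial \theta(x)\, \partial \theta(\bar{x}_i)} \theta(x)\theta(\bar{x}_i) \]

\[ \Pr'(x_i) = \Pr'(x, x_i) + \Pr'(\bar{x}, x_i) = \frac{1}{Z'} \frac{\partial^2 Z'}{\partial \theta(x)\, \partial \theta(x_i)} \theta(x)\theta(x_i) + \frac{1}{Z'} \frac{\partial^2 Z'}{\partial \theta(\bar{x})\, \partial \theta(x_i)} \theta(\bar{x})\theta(x_i) \]

After substituting $\theta(x_i) = \frac{1}{\theta(x)}$ (from our upper-bound condition) and equating the above marginals,
we get the desired result after some rearranging. □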
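
The binary case can be illustrated on a model where the original variable and its clone stay connected after relaxation. In this sketch the factors $\psi_1(A, B)$ and $\psi_2(A, B_2)$ and their values are hypothetical. Because the second partial derivatives of Equation 5 do not depend on the unit factors, the ratio $\theta(x)/\theta(\bar{x})$ can be set in closed form; the sketch fixes the scale by choosing $\theta(\bar{x}) = 1$, which is our own convention, and then checks that the marginals of `B` and `B2` agree.

```python
import math
from itertools import product

# Hypothetical factors: psi_1(A, B) and psi_2(A, B2), where B2 clones B.
psi1 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi2 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

def joint(theta, theta_c):
    """Z'(b, b2) for every pair of states, by enumeration over A."""
    return {
        (b, b2): sum(psi1[a, b] * psi2[a, b2] for a in (0, 1))
                 * theta[b] * theta_c[b2]
        for b, b2 in product([0, 1], repeat=2)
    }

# With all unit factors set to one, joint() returns the second partial
# derivatives of Equation 5, which do not depend on the unit factors.
d2 = joint([1.0, 1.0], [1.0, 1.0])

# Theorem 5 with state 0 as x and state 1 as its complement.
ratio = math.sqrt(d2[1, 0] / d2[0, 1])
theta = [ratio, 1.0]                  # scale fixed by theta(x-bar) = 1
theta_c = [1.0 / t for t in theta]    # upper-bound condition

z = joint(theta, theta_c)
Z_prime = sum(z.values())
pr_b = [(z[s, 0] + z[s, 1]) / Z_prime for s in (0, 1)]
pr_b2 = [(z[0, s] + z[1, s]) / Z_prime for s in (0, 1)]
Z = d2[0, 0] + d2[1, 1]  # exact partition function: B and B2 forced to agree
```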